Poster: Nemo_bis Date: Nov 3, 2013 11:59am
Forum: etree Subject: Wayback machine doesn't support the "Range" header AKA wget --continue doesn't work

wget --continue is still not fully working, in particular on the Wayback Machine: web.archive.org sometimes holds big files, but it stops sending them 253 bytes short of 100 MiB (104,857,600 bytes): "Connection closed at byte 104857347", says wget. You can retry at will, and in a few seconds or minutes you get 100 MiB more... but it is the same chunk again.
The wget docs say «Note that -c only works with FTP servers and with HTTP servers that support the "Range" header», and indeed it seems web.archive.org doesn't support it. In my test, the response to the second request is an HTTP/1.1 200 OK, not a 206 Partial Content. Nothing changes if I interrupt the download and resume it manually instead of letting wget retry, and the partial file does exist in the directory.
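
To make the 200-versus-206 distinction concrete, here is a minimal Python sketch of the check a resuming client has to make. The function names `range_header` and `resume_honored` are made up for illustration and are not part of wget; the sketch also assumes the response headers arrive as a plain dict with canonical capitalization.

```python
import re


def range_header(offset):
    """Build the request header that asks for everything from `offset` on."""
    return {"Range": f"bytes={offset}-"}


def resume_honored(status, headers, offset):
    """Return True only if the server really resumed at `offset`.

    A server that supports ranges answers 206 Partial Content and includes
    a Content-Range header such as "bytes 100-199/200". A plain 200 OK
    means the whole body is being sent again from byte 0.
    """
    if status != 206:
        return False
    # Assumption: header names use canonical capitalization in this dict.
    match = re.match(r"bytes (\d+)-(\d+)/(\d+|\*)$", headers.get("Content-Range", ""))
    return bool(match) and int(match.group(1)) == offset


# Illustrative values taken from the byte counts reported in this post.
offset = 104857347
print(range_header(offset))
print(resume_honored(200, {"Content-Length": "288092160"}, offset))
print(resume_honored(206, {"Content-Range": "bytes 104857347-288092159/288092160"}, offset))
```

With the 200 OK and Content-Length: 288092160 that I get on the retry, such a check fails, which matches wget starting over from the beginning. The 206 response in the last line is hypothetical: it is what a server with Range support would be expected to send.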

I also tried sending "Accept-Encoding: gzip", as suggested in some comment on the web; the full output follows.

$ wget --continue --header "Accept-Encoding: gzip" --tries=0 -S http://web.archive.org/web/20070720040924/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi
--2013-11-03 20:02:21-- http://web.archive.org/web/20070720040924/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi
Resolving web.archive.org (web.archive.org)... 207.241.224.26
Connecting to web.archive.org (web.archive.org)|207.241.224.26|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 302 Moved Temporarily
Server: Tengine/1.5.1
Date: Sun, 03 Nov 2013 20:02:21 GMT
Content-Type: video/x-msvideo
Transfer-Encoding: chunked
Connection: keep-alive
set-cookie: wayback_server=74; Domain=archive.org; Path=/; Expires=Tue, 03-Dec-13 20:02:21 GMT;
Link: ; rel="original"
Location: /web/20070810113028/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi
X-Archive-Wayback-Perf: [IndexLoad: 9, IndexQueryTotal: 9, RobotsFetchTotal: 2, RobotsRedis: 2, RobotsTotal: 2, Total: 14]
Set-Cookie: wb_total_perf=14; Expires=Sun, 03-Nov-2013 20:03:21 GMT; Path=/web/20070720040924/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi
X-Archive-Playback: 0
X-Page-Cache: MISS
Location: /web/20070810113028/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi [following]
--2013-11-03 20:02:21-- http://web.archive.org/web/20070810113028/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi
Reusing existing connection to web.archive.org:80.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: Tengine/1.5.1
Date: Sun, 03 Nov 2013 20:02:21 GMT
Content-Type: video/x-msvideo
Content-Length: 288092160
Connection: keep-alive
Memento-Datetime: Fri, 10 Aug 2007 11:30:28 GMT
Link: ; rel="original", ; rel="timemap"; type="application/link-format", ; rel="timegate", ; rel="first last memento"; datetime="Fri, 10 Aug 2007 11:30:28 GMT"
X-Archive-Orig-Connection: close
X-Archive-Orig-Content-Length: 288092160
X-Archive-Orig-Content-Type: video/x-msvideo
X-Archive-Orig-ETag: "5d5c-112bf000-4015cefd252c0"
X-Archive-Orig-Server: Apache
X-Archive-Orig-Accept-Ranges: bytes
X-Archive-Orig-Last-Modified: Thu, 22 Sep 2005 14:16:19 GMT
X-Archive-Orig-Date: Fri, 10 Aug 2007 11:30:28 GMT
X-Archive-Wayback-Perf: [IndexLoad: 6, IndexQueryTotal: 6, RobotsFetchTotal: 2, RobotsRedis: 2, RobotsTotal: 2, Total: 31, WArcResource: 24]
Set-Cookie: wb_total_perf=31; Expires=Sun, 03-Nov-2013 20:03:21 GMT; Path=/web/20070810113028/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi
X-Archive-Playback: 1
X-Page-Cache: MISS
Length: 288092160 (275M) [video/x-msvideo]
Saving to: `Wikimania05-AP1.avi'

36% [====================================> ] 104,857,347 --.-K/s in 1m 45s

2013-11-03 20:04:07 (976 KB/s) - Connection closed at byte 104857347. Retrying.

--2013-11-03 20:04:08-- (try: 2) http://web.archive.org/web/20070810113028/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi
Connecting to web.archive.org (web.archive.org)|207.241.224.26|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: Tengine/1.5.1
Date: Sun, 03 Nov 2013 20:04:08 GMT
Content-Type: video/x-msvideo
Content-Length: 288092160
Connection: keep-alive
Memento-Datetime: Fri, 10 Aug 2007 11:30:28 GMT
Link: ; rel="original", ; rel="timemap"; type="application/link-format", ; rel="timegate", ; rel="first last memento"; datetime="Fri, 10 Aug 2007 11:30:28 GMT"
X-Archive-Orig-Connection: close
X-Archive-Orig-Content-Length: 288092160
X-Archive-Orig-Content-Type: video/x-msvideo
X-Archive-Orig-ETag: "5d5c-112bf000-4015cefd252c0"
X-Archive-Orig-Server: Apache
X-Archive-Orig-Accept-Ranges: bytes
X-Archive-Orig-Last-Modified: Thu, 22 Sep 2005 14:16:19 GMT
X-Archive-Orig-Date: Fri, 10 Aug 2007 11:30:28 GMT
X-Archive-Wayback-Perf: [IndexLoad: 10, IndexQueryTotal: 10, RobotsFetchTotal: 5, RobotsRedis: 5, RobotsTotal: 5, Total: 87, WArcResource: 74]
Set-Cookie: wb_total_perf=87; Expires=Sun, 03-Nov-2013 20:05:08 GMT; Path=/web/20070810113028/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi
X-Archive-Playback: 1
X-Page-Cache: MISS
Length: 288092160 (275M) [video/x-msvideo]
Saving to: `Wikimania05-AP1.avi'

36% [====================================> ] 104,857,347 --.-K/s in 85s

2013-11-03 20:05:33 (1.18 MB/s) - Connection closed at byte 104857347. Retrying.

$ wget --version
GNU Wget 1.13.4 built on linux-gnu.

+digest +https +ipv6 +iri +large-file +nls +ntlm +opie +ssl/openssl

Wgetrc:
/etc/wgetrc (system)
Locale: /usr/share/locale
Compile: gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
-DLOCALEDIR="/usr/share/locale" -I. -I../../src -I../lib
-I../../lib -D_FORTIFY_SOURCE=2 -Iyes/include -g -O2
-fstack-protector --param=ssp-buffer-size=4 -Wformat
-Wformat-security -Werror=format-security -DNO_SSLv2
-D_FILE_OFFSET_BITS=64 -g -Wall
Link: gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat
-Wformat-security -Werror=format-security -DNO_SSLv2
-D_FILE_OFFSET_BITS=64 -g -Wall -Wl,-Bsymbolic-functions
-Wl,-z,relro -Lyes/lib -lssl -lcrypto -lz -ldl -lz -lidn -lrt
ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a